Fusing graph transformer with multi-aggregate GCN for enhanced drug–disease associations prediction

Background Identification of potential drug–disease associations is important both for discovering new indications for drugs and for reducing unknown adverse drug reactions. Exploring the potential links between drugs and diseases is crucial for advancing biomedical research and improving healthcare. While advanced computational techniques play a vital role in revealing the connections between drugs and diseases, current research still faces challenges in mining potential drug–disease relationships from heterogeneous network data.
Results In this study, we propose a learning framework, termed WMAGT, that fuses Graph Transformer Networks and a multi-aggregate graph convolutional network to learn efficient heterogeneous information graph representations for drug–disease association prediction. The method harnesses the capabilities of a robust graph transformer, modeling the local and global interactions of nodes by integrating a graph convolutional network and a graph transformer with self-attention mechanisms in its encoder. We first integrate drug–drug, drug–disease, and disease–disease networks to construct a heterogeneous information graph. The multi-aggregate graph convolutional network and the graph transformer are then used in conjunction with a neural collaborative filtering module to integrate information from different domains into a highly effective feature representation.
Conclusions Rigorous cross-validation and ablation studies examined the robustness and effectiveness of the proposed method. Experimental results demonstrate that WMAGT outperforms other state-of-the-art methods in accurate drug–disease association prediction, which is beneficial for drug repositioning and drug safety research.

procedures. The average cost of bringing a new drug to market exceeds 2 billion dollars, and it takes 10-15 years before the drug can reach the pharmacy shelf [1-3]. Therefore, finding new indications for existing drugs, also known as drug repositioning, is an economically viable and time-saving strategy [4,5]. Computational methods for drug repositioning facilitate the identification of potential drug-disease associations by screening large-scale data sources, enabling more rational design of clinical trials. Such strategies can accelerate the drug discovery pipeline and increase the availability of new treatments [6].
The utilization of computational methods for drug repositioning in drug discovery has become widespread [7,8]. During this time, a growing array of methodologies has emerged. For example, methods rooted in matrix decomposition, like the one by Cui et al. [9], utilize dual-network $L_{2,1}$-collaborative matrix factorization for predicting novel drug-disease interactions. Fu et al. [10] introduced MFLDA, a method that decomposes the matrices of heterogeneous data sources into low-rank forms using matrix tri-factorization, thus exploring and exploiting their inherent and shared structure. MFLDA facilitates the selection and integration of these data sources by assigning varying weights to each source. Matrix factorization diminishes data dimensionality by transforming matrices into low-rank structures, extracting crucial features and patterns. However, with extensive or dense, high-dimensional matrices, its computational demands might become excessive, resulting in reduced efficacy in managing considerable noise.
Network-based drug repositioning models have emerged to confront this challenge, striving to capitalize on intricate relational networks among biological entities such as drugs and diseases. These models amalgamate varied information sources, encompassing protein-protein interaction networks, gene expression data, and drug compound information, to anticipate potential novel drug-disease associations. For instance, Zhang et al. [11] proposed NTSIM to predict unobserved drug-disease associations and extended it to NTSIM-C for classifying therapeutic associations. Zhang et al. [12] proposed SCMFDD, projecting drug-disease associations into two low-rank spaces, revealing latent features, and introducing feature-based similarity and semantic constraints. Lu et al. [13] proposed a heterogeneous information network (HIN) based model, namely HINGRL. Zhou et al. [14] introduced NEDD, using varied-length metapaths to explicitly capture internal relationships within drugs and diseases and obtain low-dimensional representation vectors. Martínez et al. [15] developed DrugNet, a network-based method for predicting new drug uses and treatments for diseases. It utilizes a heterogeneous network formed from disease, drug, and target information, identifying novel associations by information propagation.
Despite the commendable interpretability of network-based methodologies, their performance is deemed unsatisfactory. The drug-disease association network naturally has a graph structure, however, which enables techniques based on graph neural networks to preserve essential information, eliminate noise, and extract pivotal patterns and features. This in turn enhances the accuracy of the information available for prediction and analysis in graph-based data scenarios. Several methods have been proposed to exploit the advantages of graph neural networks for drug-disease association prediction. Yu et al. [16] proposed LAGCN, a method that integrates heterogeneous networks, employs graph convolutional operations, and incorporates an attention mechanism. Yang et al. [17] introduced a model that infers drug-disease associations by applying network-embedding algorithms alongside a random forest classification approach. Gu et al. [18] introduced REDDA, a heterogeneous graph neural network with three attention mechanisms for sequential drug-disease representation learning. Wu et al. [19] proposed EMP-SVD, a framework that predicts drug-disease associations by integrating multiple meta-paths and singular value decomposition. Li et al. [20] proposed the NIMCGCN method, which integrates Graph Convolutional Networks (GCN) with Neural Inductive Matrix Completion (NIMC) models to discover associations between miRNAs and diseases. VGAE [21], introduced by Kipf et al., is a graph neural network model based on the Variational Autoencoder (VAE) framework, which models node features through an encoder-decoder structure, mapping them to a latent space distribution. Meng et al. [22] leveraged deep learning within a heterogeneous network framework to identify potential drugs related to diseases. Their model, DRWBNCF, employs weighted bilinear graph convolution operations, fusing information from drug-disease associations and drug-disease similarity networks. Tang et al. [23] proposed the DRGBCN model, which exploits the embedding of graph convolutional layers and local interactions between drugs and diseases, thus significantly improving the accuracy and reliability of predictions. Ghasemian et al. [24] applied meta-learning methods in network analysis to develop a stacked model that integrates complex prediction algorithms from various domains, effectively mitigating variations in link prediction. Additionally, many studies [25,26] have indicated that collaborative drug combination prediction is widely applied in drug repositioning. For instance, the SNRMPACDC model proposed by Li et al. [27] combines Siamese convolutional networks and random matrix projection to predict collaborative combinations of anticancer drugs. NLLSS [28] is a semi-supervised learning-based model that focuses on predicting collaborative drug combinations, enhancing predictive performance through non-negative low-rank and sparse structures. Such methods play a crucial role in revealing the associations between drugs and biomolecules, as well as in drug repositioning.
Although these methods can extract edge information through extensive informatics learning, they often struggle to fully mine the complex interactions between nodes in heterogeneous graphs, which may affect the accuracy of predictions. To address these challenges, we propose a heterogeneous information graph representation learning method for predicting drug-disease associations, named WMAGT, which utilizes a weighted multi-aggregate graph convolutional network and a graph transformer to exploit discriminative node representations. The workflow of the proposed method is shown in Fig. 1. Specifically, WMAGT first integrates drug-drug similarity networks, disease-disease similarity networks, and validated drug-disease association networks to construct a comprehensive heterogeneous information network. Then, a graph transformer combined with a weighted multi-aggregate graph convolutional neural network is used to learn efficient characterizations of drugs and diseases from this heterogeneous information network. Prior to the predictor, we integrate domain embeddings and interaction embeddings through neural collaborative filtering for the final link prediction scoring. To evaluate the proposed method, we cross-validated the predictive performance of WMAGT on three benchmark datasets and compared it with five state-of-the-art methods, while also conducting ablation experiments. Experimental results provide strong evidence of the effectiveness of WMAGT in discovering drug indications, which is important for advancing drug repurposing and reducing adverse drug reactions in the field of drug discovery. The main contributions and advantages of WMAGT include:
1. Proposing a representation learning method based on heterogeneous information graphs, which fully utilizes the multi-source information of drugs and diseases and considers their multi-level relationships.
2. Adopting a representation learning framework of Graph Transformer and Weighted Multi-Aggregation Graph Convolutional Neural Network, effectively eliminating the impact of heterogeneity and capturing relationships to learn more effective node representations.
3. Demonstrating through experiments on three public datasets that this method has broad application prospects in the field of drug discovery. Our approach not only outperforms existing models in predictive performance but also shows significant improvement in understanding complex biological networks.

Methods
In this study, we propose a novel computational method, WMAGT, for drug repositioning, aiming to discover new indications for existing drugs by inferring potential drug-disease associations. First, we build a heterogeneous network that incorporates various types of relations in the data set, such as the drug-drug similarity network, the disease-disease similarity network, and the drug-disease association network. Then, we utilize an end-to-end model to learn the latent features of the network and predict the unknown associations.

Benchmark datasets
To explore heterogeneous network prediction methods for drug-disease associations, we utilized three publicly available real datasets to assess the efficacy of our model. The first dataset, Fdataset [29], comprises 313 diseases from the OMIM database [30] and 553 drugs from the DrugBank database [31], along with 1933 known associations between them. The second dataset, Cdataset [32], consists of 663 drugs from the DrugBank database and 409 diseases from the OMIM database, encompassing 2532 established associations between drugs and diseases. The third dataset, LRSSL [33], comprises 763 drugs from the DrugBank database, 681 diseases from the MeSH database, and 3051 validated associations between drugs and diseases. The essential statistics of these three datasets are presented in Table 1.

The construction of heterogeneous information graph
To predict potential drug-disease associations, this research employed network analysis methods based on a known drug-disease association network denoted as G. G is represented by an n × m binary matrix A, where n and m represent the number of drugs and diseases, respectively. Each entry $A_{ij}$ holds a value of 1 or 0, indicating the presence or absence of an experimentally validated association between drug $r_i$ and disease $d_j$.
Two additional similarity networks were constructed: a drug-drug similarity network $G^r$ and a disease-disease similarity network $G^d$. These networks are represented by n × n and m × m matrices $A^r$ and $A^d$, respectively. The values $A^r(i, j)$ and $A^d(i, j)$ represent the similarities between drug $r_i$ and drug $r_j$, and between disease $d_i$ and disease $d_j$, respectively. These similarities were computed based on various characteristics, including chemical, pharmacological, therapeutic, phenotypic, genetic, and environmental properties of drugs or diseases.
To enhance accuracy and reduce noise, a k-nearest neighbor approach was employed, which considers only the k most similar neighbors of each drug or disease. The extended k-nearest neighbor set of a drug or disease, denoted $\tilde{N}_k$, comprises the entity itself together with its k nearest neighbors. $A^r(i, j)$ and $A^d(i, j)$ then give the similarities among drugs or diseases restricted to their extended k-nearest neighbor sets $\tilde{N}_k$.
Consider two sets representing drugs (R) and diseases (D). For each $r \in R$ and $d \in D$, an association label $Y_{r,d}$ signifies the presence ($Y_{r,d} = 1$) or absence ($Y_{r,d} = 0$) of an association between drug r and disease d:

$$Y_{r,d} = \begin{cases} 1, & \text{if } r \text{ is associated with } d \\ 0, & \text{otherwise} \end{cases}$$

Inferring the association label $Y_{r,d}$ for a given drug r and disease d therefore relies on the known associations within the sets. This representation establishes the foundation of the drug repositioning problem, framing it as a task of predicting association labels.
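The k-nearest neighbor filtering described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `knn_sparsify`, the toy similarity matrix, and the rule that similarities outside the extended neighbor set are set to zero are all assumptions made for the example.

```python
def knn_sparsify(sim, k):
    """Keep, for each row i, the entry for i itself plus its k most similar
    other nodes; every other similarity is set to 0 (assumed rule)."""
    n = len(sim)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other nodes by similarity to i (ties broken by index).
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (-sim[i][j], j))
        extended = {i} | set(others[:k])  # extended k-nearest-neighbor set
        for j in extended:
            out[i][j] = sim[i][j]
    return out


# Toy 4 x 4 similarity matrix (hypothetical values).
sim = [
    [1.0, 0.8, 0.1, 0.3],
    [0.8, 1.0, 0.2, 0.6],
    [0.1, 0.2, 1.0, 0.7],
    [0.3, 0.6, 0.7, 1.0],
]
sparse = knn_sparsify(sim, k=1)
for row in sparse:
    print(row)
```

Note that row-wise filtering of this kind can produce an asymmetric matrix in general, since node j may be among the nearest neighbors of node i without the reverse being true.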

Graph convolutional network module
In a drug-disease heterogeneous graph, nodes represent various drugs and diseases. Typically, each node contains its own similarity information, and the edges connecting two nodes represent the relationship between them. We employ Graph Convolutional Networks (GCN) [34-37] to integrate node information, which usually consist of aggregation functions and update functions. Aggregation functions are applied to each node/edge to gather information from their neighbors, while update functions generate new representations for each node/edge based on the collected information and the previous representation. The update function is defined as follows:

$$H^{(l+1)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right)$$

Here, $H^{(l)}$ represents the input features at layer l in the GCN, and $H^{(l+1)}$ signifies the output features at layer l + 1 after the convolution operation. $\sigma$ is the activation function (commonly ReLU or Leaky ReLU). $W^{(l)}$ is the learnable weight matrix at layer l. $\tilde{A} = A + I_n$ represents the adjacency matrix of the graph with self-loops, where A is the original adjacency matrix and $I_n$ is the identity matrix. $\tilde{D}$ is the diagonal node degree matrix of $\tilde{A}$.
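To make the update function concrete, the following sketch applies one GCN propagation step to a two-node toy graph in plain Python. It assumes the standard symmetric normalization with self-loops and ReLU as the activation; the helper names `matmul` and `gcn_layer` and all numeric values are illustrative assumptions.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, h, w):
    """One propagation step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops: A_tilde = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Degree of each node in A_tilde, then symmetric normalization.
    deg = [sum(row) for row in a_tilde]
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]  # ReLU


adj = [[0.0, 1.0], [1.0, 0.0]]   # two connected nodes
h = [[1.0, 0.0], [0.0, 1.0]]     # one-hot input features
w = [[1.0, -1.0], [2.0, 0.5]]    # illustrative weights
out = gcn_layer(adj, h, w)
print(out)
```

With self-loops, both nodes have degree 2, so every entry of the normalized adjacency is 0.5 and both nodes receive the same averaged representation; the negative second component is clipped to zero by ReLU.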

Node attentions in graph transformer module
Recently, the Transformer model has extended its application beyond the field of natural language processing to a wide range of tasks, including link prediction. In the information integration module, we have incorporated both the graph transformer [38-41] and GCN, thereby enhancing the model's flexibility and performance. The two fundamental components of the Transformer are the dot-product attention mechanism and the feedforward network, which play crucial roles in link prediction tasks.
The graph attention formula is:

$$h_i^{(l+1)} = \sum_{j=1}^{n} \alpha_{ij}^{(l)} W^{(l)} h_j^{(l)}$$

where $h_j^{(l)}$ is the j-th node's feature vector in layer l, $W^{(l)}$ is the layer's weight matrix, n is the graph size, and $\alpha_{ij}^{(l)}$ is the attention weight between nodes i and j in layer l, computed by

$$\alpha_{ij}^{(l)} = \frac{\exp\left(e_{ij}^{(l)}\right)}{\sum_{k=1}^{n} \exp\left(e_{ik}^{(l)}\right)}$$

where $e_{ij}^{(l)}$ is the similarity between nodes i and j in layer l, computed by:

$$e_{ij}^{(l)} = a^{(l)}\left(h_i^{(l)}, h_j^{(l)}\right)$$

where $a^{(l)}$ is a differentiable similarity function in layer l, such as dot product, bilinear, multilayer perceptron, etc.
Layer-wise transformation. At each layer of the Graph Transformer, the hidden states of nodes are updated using multi-head self-attention and feedforward neural networks, following the formulation in [42].
Aggregation across heads. The outputs from the H attention heads, where H represents the number of attention heads, are aggregated to obtain the final node representations.
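A single-head version of the attention update can be sketched as follows. This is only an illustration under assumptions: the similarity function is taken to be a plain dot product on the input features, the attention weights are normalized with a softmax over all n nodes, and features are stored as row vectors (so each node is multiplied by W from the right). The function name `attention_update` and the toy inputs are hypothetical.

```python
import math


def attention_update(h, w):
    """h: n x d node features, w: d x d' weights.
    Returns the updated features and the n x n attention weights."""
    n_out = len(w[0])
    # Project every node: row vector h_j times W.
    proj = [[sum(x * w[k][c] for k, x in enumerate(row)) for c in range(n_out)]
            for row in h]
    out, weights = [], []
    for hi in h:
        # e_ij: dot-product similarity between node i and every node j.
        scores = [sum(a * b for a, b in zip(hi, hj)) for hj in h]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        alpha = [e / total for e in exps]         # attention weights, sum to 1
        weights.append(alpha)
        # Weighted sum of the projected neighbor features.
        out.append([sum(a * pj[c] for a, pj in zip(alpha, proj))
                    for c in range(n_out)])
    return out, weights


h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the example readable
new_h, alpha = attention_update(h, w)
print(alpha[0])
```

In this toy case node 0 attends equally to itself and to node 2, which share a dot product of 1 with it, and less to node 1, whose dot product is 0. A multi-head variant would run several such updates with separate weights and combine their outputs.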

Overview of the proposed WMAGT model
This section describes the model shown in Fig. 1 in detail, covering its architecture, the methods employed, and the components that contribute to its predictive capability.

Graph representation learning with mixed aggregation parameters
When learning node neighborhood information, a hybrid approach with mixed parameters is employed, integrating two distinct graph convolution operations to acquire meaningful representations of graph data. The fundamental idea of hybrid parameters is a weighted combination of the outputs of the two graph convolution operations, which generates the final node features. Here, α and β control the relative influence of the two aggregation methods. The Rectified Linear Unit function (ReLU) [43] is employed as the activation function, while Pool represents a customized pooling operation involving specific manipulations of the adjacency matrix. This process involves the product of the adjacency matrix and the node feature matrix, square operations, and some further matrix operations. In this operation, A is the adjacency matrix of the graph, XW is the node feature matrix, and Z represents the new node representation matrix obtained after the graph pooling operation. This operation introduces additional information into the graph structure, aiming to better capture the relationships between nodes. In general, the introduction of hybrid parameters imparts adaptability to the model, allowing it to determine the relative contributions of different graph convolution operations during the learning process and thus better adapt to diverse graph structures.
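The exact form of the pooling operation is not spelled out in the text, so the following sketch should be read as one plausible instantiation rather than the model's definition. It assumes that Pool computes pairwise neighbor interactions as Z = 0.5 * ((A·XW)² - A·(XW)²) with element-wise squares, which matches the description of an adjacency-feature product combined with square operations, and that the final features are α·ReLU(A·XW) + β·ReLU(Z). The function names and the toy values are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def square(m):
    return [[v * v for v in row] for row in m]


def mixed_aggregate(adj, xw, alpha, beta):
    """Weighted mix of a sum aggregator and an assumed pairwise pooling."""
    summed = matmul(adj, xw)  # A (XW): sum over each node's neighbors
    # Assumed pooling: 0.5 * ((A XW)^2 - A (XW)^2), i.e. the sum of
    # element-wise products over all distinct neighbor pairs (binary A).
    sq_of_sum = square(summed)
    sum_of_sq = matmul(adj, square(xw))
    pooled = [[0.5 * (p - q) for p, q in zip(r1, r2)]
              for r1, r2 in zip(sq_of_sum, sum_of_sq)]
    a, b = relu(summed), relu(pooled)
    return [[alpha * u + beta * v for u, v in zip(r1, r2)]
            for r1, r2 in zip(a, b)]


# Node 0 has neighbors 1 and 2; nodes 1 and 2 only see node 0.
adj = [[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
xw = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]  # already-projected features XW
out = mixed_aggregate(adj, xw, alpha=0.6, beta=0.4)
print(out[0], out[1])
```

For node 0 the pooled term equals the element-wise product of its two neighbors' features, while nodes with a single neighbor get a zero pooled term, so β only matters where neighbor pairs exist.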

Computing weighted matrices for drug and disease nodes
To compute the weighted matrices for drugs and diseases, we utilize the input feature matrix $X \in \mathbb{R}^{N \times d}$, where N represents the number of nodes and d denotes the feature dimension. The weight matrix $W \in \mathbb{R}^{d \times d'}$ maps the features to the output feature dimension d′. The graph's adjacency matrix, $A \in \{0, 1\}^{N \times N}$, is symmetric and encodes the connections between nodes. The weighted feature matrix is obtained from X, W, and A. The iterative graph convolution concludes with the normalization of the resulting feature matrix using a normalization matrix. Furthermore, an element-wise addition of a bias term is performed to further refine the output matrix.

Compute the element-wise product of drug and disease embeddings
Let $D \in \mathbb{R}^{N_{drug} \times d}$ be the input drug embedding matrix and $E \in \mathbb{R}^{N_{disease} \times d}$ the disease embedding matrix, where $N_{drug}$ and $N_{disease}$ represent the numbers of drugs and diseases, respectively, and d represents the embedding dimension.
Projection of Drug and Disease via Linear Mapping: this involves computing the element-wise product of drug and disease embeddings. Let $D_{ij}$ represent the element in the ith row and jth column of matrix D, and $E_{ij}$ the element in the ith row and jth column of matrix E. The element-wise product P is obtained as:

$$P_{ij} = D_{ij} \cdot E_{ij}$$

The resulting matrix P captures the element-wise products of the drug and disease embeddings. This process facilitates the exploration of interactions between drugs and diseases within a feature space defined by their embeddings.
To normalize the association matrix P, $L_2$ normalization is applied row-wise after the element-wise product is computed. Each row's $L_2$ norm is computed, and its elements are divided by this norm, ensuring a unit $L_2$ norm per row. The normalization formula is:

$$P^{norm}_{ij} = \frac{P_{ij}}{\sqrt{\sum_{k=1}^{N} P_{ik}^{2}}}$$

where N denotes the column count, and the summation extends over the row's elements. This yields a matrix $P^{norm}$ with standardized rows, enhancing association representation and minimizing dataset bias.
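The two steps above (element-wise product, then row-wise L2 normalization) can be sketched as below. The sketch assumes that the drug and disease embeddings have already been paired row by row, one row per candidate drug-disease pair, and it adds a small `eps` guard against all-zero rows; both choices, like the function name `interaction_embedding`, are assumptions for illustration.

```python
import math


def interaction_embedding(drug_rows, disease_rows, eps=1e-12):
    """Element-wise product of paired rows followed by row-wise L2 scaling."""
    product = [[x * y for x, y in zip(dr, er)]
               for dr, er in zip(drug_rows, disease_rows)]
    normalized = []
    for row in product:
        norm = math.sqrt(sum(v * v for v in row))
        # Divide by the row norm; eps avoids division by zero.
        normalized.append([v / max(norm, eps) for v in row])
    return normalized


drug_rows = [[3.0, 1.0], [0.0, 2.0]]      # hypothetical drug embeddings
disease_rows = [[1.0, 4.0], [5.0, 0.0]]   # hypothetical disease embeddings
p_norm = interaction_embedding(drug_rows, disease_rows)
print(p_norm)
```

The first pair yields the product (3, 4), whose norm is 5, so the normalized row is (0.6, 0.8). The second pair has no overlapping non-zero dimensions, so its interaction row stays at zero.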

Neural collaborative filtering for drug and disease expression
Neural Collaborative Filtering (NCF) [44] is a neural network-based collaborative filtering algorithm designed for learning relationships between users and items for recommendation purposes. The implementation of NCF in our model involves three key components:
• Neighbor embedding process: integrates information from the neighbors of drugs and diseases to better capture relationships between nodes.
• Interaction embedding process: defines the interaction information between drugs and diseases. This part mainly involves calculating the interaction embedding through element-wise multiplication and normalization operations.
• Decoding process: defines the decoder, which transforms embedded node representations into final prediction scores. This process primarily involves linear transformations and non-linear activation functions.
In summary, in the forward method of the model, node embedding representations are first obtained through processes such as neighbor embedding and interaction embedding.Subsequently, the decoder yields the final prediction scores.The core idea of Neural Collaborative Filtering involves learning implicit relationships between drugs and diseases through processes such as embedding, neighbor embedding, interaction embedding, and decoding.

Neighbor-weighted interaction decoding
In this module, the descriptions of drug-disease associations, drug proximity, and disease proximity are amalgamated into a unified vector hr,d using the concatenation operation ⊕, defined as:

hr,d = h r,d ⊕ h r,d ⊕ hr ⊕ hd

Here, the operator ⊕ signifies concatenation, facilitating the formation of an encompassing representation that merges established associations with contextual information drawn from drug and disease proximities.
Subsequently, linear transformations and ReLU activation were utilized in processing the hidden layers. In each hidden layer i, where i ranges from 1 to the length of hidden_dims, a linear transformation followed by ReLU activation generates $z_i$. Afterwards, at the output layer, a linear transformation and Sigmoid activation are applied to the output of the last hidden layer, $z_{\mathrm{len}(hidden\_dims)}$, producing the output y. The overall model output Y can be interpreted as probabilities for specific categories.

MLP-based prediction
The introduction of the Multilayer Perceptron (MLP) is motivated by its ability to capture intricate nonlinear relationships, extract advanced features, manage sparse data, and offer a flexible architecture adaptable to various data traits. Within drug-disease association studies, integrating an MLP aims to enhance the accurate prediction and interpretation of complex drug-disease associations, thereby providing deeper insights into correlation studies within the pharmaceutical domain. Forward propagation in an MLP involves multiple layers, each with numerous neurons. Assuming inputs X, H neurons in the hidden layer, output Y, weight parameters W, and biases b, the forward propagation can be represented as:

$$z_0 = X, \qquad z_i = \mathrm{ReLU}\left(W_i z_{i-1} + b_i\right), \quad i = 1, \dots, \mathrm{len}(hidden\_dims)$$

$$Y = \mathrm{Sigmoid}\left(W_{out}\, z_{\mathrm{len}(hidden\_dims)} + b_{out}\right)$$
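The decoding path described in this and the preceding subsection (concatenate the embeddings, pass them through ReLU hidden layers, and finish with a sigmoid output) can be sketched as follows. The weights, the input parts, and the function names `linear` and `decode` are hypothetical and chosen so the arithmetic is easy to follow; the paper's actual decoder uses hidden dimensions (64, 32).

```python
import math


def linear(x, w, b):
    """Affine map: one output per weight row, w[o][i] * x[i] + b[o]."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bo
            for row, bo in zip(w, b)]


def decode(parts, layers):
    """parts: list of embedding vectors to concatenate.
    layers: list of (weights, bias); the last one is the output layer."""
    h = [v for part in parts for v in part]          # concatenation
    for w, b in layers[:-1]:
        h = [max(0.0, v) for v in linear(h, w, b)]   # hidden layer + ReLU
    w, b = layers[-1]
    (logit,) = linear(h, w, b)                       # single output unit
    return 1.0 / (1.0 + math.exp(-logit))            # sigmoid probability


parts = [[1.0, 2.0], [0.5]]   # e.g. interaction and neighbor embeddings
layers = [
    ([[1.0, 0.0, 0.0], [0.0, -1.0, 2.0]], [0.0, 0.0]),  # hidden layer
    ([[2.0, 3.0]], [-2.0]),                             # output layer
]
score = decode(parts, layers)
print(score)
```

Here the hidden layer produces (1, -1), ReLU clips it to (1, 0), and the output logit is 2·1 + 3·0 - 2 = 0, which the sigmoid maps to a probability of 0.5.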

Parameters setting
Hyperparameter settings are crucial for fine-tuning the neural collaborative filtering model, covering dimensions such as node embedding, neighbor embedding, and decoder hidden layers. Specifically, the node embedding dimension is set to 64, the neighbor embedding dimension to 32, and the decoder hidden layer dimensions to (64, 32). The learning rate is set to 5e-4, and the dropout rate is 0.3. Additionally, a comprehensive set of loss functions is utilized, encompassing binary cross-entropy loss, focal loss, mean squared error loss, and ranking loss. For the focal loss, the parameters are configured with α set to 0.5 and γ set to 2.0. The graph transformer network parameter is defined as λ = 0.8. Throughout the training process, these loss functions are considered jointly, aiming to comprehensively optimize the model. These configurations are designed to strike a balance between model complexity and performance.
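As a point of reference for the focal loss settings above, the sketch below implements the commonly used α-balanced focal loss for a single prediction, with the defaults α = 0.5 and γ = 2.0 taken from the text. Whether the model uses exactly this form, and how it is weighted against the other loss terms, is not stated here, so this should be read as an assumption.

```python
import math


def focal_loss(p, y, alpha=0.5, gamma=2.0, eps=1e-7):
    """Alpha-balanced focal loss for one predicted probability p and a
    binary label y (1 = associated, 0 = not associated)."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() finite
    if y == 1:
        # Confident correct positives (p close to 1) are down-weighted.
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    # Confident correct negatives (p close to 0) are down-weighted.
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


easy = focal_loss(0.9, 1)   # well-classified positive
hard = focal_loss(0.6, 1)   # less certain positive
print(easy, hard)
```

With γ = 0 the modulating factor disappears and the loss reduces to a class-weighted binary cross-entropy; larger γ shifts the training signal toward hard examples, which is useful when known associations are rare.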
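In the loss function, α controls the balance of weights between positive and negative samples, and γ regulates the focus of the focal loss. We use the Adam [45] optimizer to update model parameters, ensuring efficient training. A cyclic learning rate scheduler dynamically adjusts the learning rate, enhancing training effectiveness. Additionally, the model incorporates two graph neural network layers (Graph Transformer and Graph Convolution Network), employing different neighbor sampling quantities during training.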

Evaluation metrics
We adopted six widely used indicators to measure the predictive performance of the proposed model: accuracy (Acc), Area Under the Precision-Recall Curve (AUPR), Area Under the Receiver Operating Characteristic Curve (AUC), F1 score, Precision, and Recall. Of these, AUPR and F1 are more sensitive to severely imbalanced data. Micro metrics are used for AUPR and AUC, while macro metrics are used for the other measurements. The definitions of these indicators can be described as follows:

$$Acc = \frac{TP + TN}{TP + TN + FP + FN}$$

$$Precision = \frac{TP}{TP + FP}$$

$$Recall = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

where TP and TN denote the number of correctly predicted positive and negative samples, and FP and FN denote the number of wrongly predicted positive and negative samples, respectively. The Micro mode treats each element of the label indicator matrix as a label, whereas the Macro mode calculates the metric for each label and takes the unweighted average.
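The threshold-based metrics can be computed directly from the four confusion counts, as the short sketch below shows. The counts in the example are made up for illustration and do not come from the experiments; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"acc": acc, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced case: 13 true associations among 100 pairs.
m = classification_metrics(tp=8, tn=85, fp=2, fn=5)
print(m)
```

In this example accuracy is 0.93 even though 5 of the 13 true associations are missed, while recall (about 0.62) and F1 (about 0.70) expose the misses, which illustrates why accuracy alone is a weak indicator on imbalanced data.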

Baseline methods
NIMCGCN [20]. This study introduces an approach named Neural Inductive Matrix Completion with Graph Convolutional Network (NIMCGCN), which combines Graph Convolutional Networks (GCNs) and Neural Inductive Matrix Completion (NIMC) models to forecast associations between miRNAs and diseases. Its parameters are optimized through supervised learning, and experimental validation demonstrated its superiority in prediction accuracy and in forecasting for new diseases, making it an effective computational tool for swiftly identifying disease-associated miRNAs.
DRWBNCF [22]. This study introduces a method called DRWBNCF for drug repositioning, addressing limitations of traditional latent factor models. Leveraging deep learning techniques and a heterogeneous network framework, DRWBNCF infers potential drugs for diseases. It combines drug-disease association information with drug-disease similarity networks, employs a weighted bilinear graph convolution operation, and uses a multi-layer perceptron together with an α-balanced focal loss function and graph regularization to predict unknown drug-disease associations.
Ghasemian's model [24]. Ghasemian et al. employed a meta-learning approach within network analysis to devise a stacked model that combines various sophisticated prediction algorithms. This approach mitigated the variations observed in link prediction across diverse domains of networks.
VGAE [21]. Variational Graph Auto-Encoders (VGAE) is a graph neural network model built upon the framework of Variational Autoencoders (VAE). VGAE adopts the encoder-decoder structure of VAE, modeling node features as latent space distributions and reconstructing them back to the original feature space. Key features include probabilistic modeling, which represents node embeddings as Gaussian distributions using reparameterization techniques and KL divergence, and the use of graph convolutional networks (GCNs) to efficiently capture local structural information.
DRGBCN [23]. DRGBCN utilizes bilinear attention networks and local interactive learning to improve performance in drug repositioning tasks. Significant performance gains are achieved by emphasizing local associations and deep learning applications in the medical domain.

Comparison of WMAGT and state-of-the-art methods under tenfold cross-validation
To evaluate the performance of the WMAGT model, we conducted extensive experiments on three benchmark datasets, comparing WMAGT with five state-of-the-art methods under tenfold cross-validation. The results are reported in Table 2 and Figs. 2, 3 and 4. [...] model 0.4617, VGAE 0.0526 and DRGBCN 0.3721. Average performance is a crucial indicator for assessing the overall effectiveness of models. In this regard, WMAGT demonstrated superior average performance in both AUROC and AUPR, highlighting its effectiveness across various datasets. Statistical testing and analysis show that the proposed WMAGT achieves a significant performance improvement over the compared models.
Fig. 4 The performance of WMAGT and other compared methods under tenfold cross-validation on LRSSL

Ablation study
In this section, we examine the impact of two pivotal modules on our experimental framework:
• 'w/o Transformer': this variant excludes the transformer mechanism, showing how its removal affects the way the model handles information, learns representations, and ultimately predicts drug-disease relationships.
• 'w/o NCF': this variant removes collaborative filtering, which plays a crucial role in the model's effectiveness in handling user-item associations, here drug-disease pairs. In this variant we employed a simplified approach, omitting the steps of neighbor embedding and interaction embedding and directly feeding the node representations obtained from the graph convolution module into the decoder.
The results of the ablation study in Fig. 5 show the consequences of these decisions. Notably, both the transformer module and the NCF module contribute significantly to the performance in predicting drug-disease relationships, with the NCF module being particularly noteworthy. This indicates that, when multiple embeddings and the transformer are considered together, the model can more accurately capture latent relationships between drugs and diseases, thereby improving predictive accuracy. This finding provides useful guidance for future model optimization and further research.

Case study
To assess the practical applicability of WMAGT, a case study was conducted with the aim of predicting drug candidates for Parkinson's disease. Specifically, the model was trained using all known drug-disease associations in Fdataset, and the predicted probabilities of all drug-disease associations were then ranked in descending order. The top 10 drug candidates associated with Parkinson's disease were selected for in-depth investigation. Parkinson's disease is a chronic neurological disorder typically characterized by symptoms such as movement disorders, muscle stiffness, and tremors. The primary cause of this disease is the loss of dopamine-producing neurons in the brain, where dopamine functions as a neurotransmitter controlling movement. Currently, the treatment of Parkinson's disease primarily focuses on alleviating symptoms, and the exploration of new drug treatment directions has been a crucial area of scientific research. Encouragingly, the relevance of seven of these drugs was further confirmed by additional literature, as shown in Table 3. This discovery not only enhances the reliability of our model but also indicates that WMAGT successfully identifies potential drug-disease pairs by learning multi-source information about drugs and diseases.
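The ranking step of the case study amounts to sorting candidate drugs by their predicted probability for the target disease. The sketch below illustrates this with invented drug names and scores; excluding already known associations from the ranking is an assumption made for the example and is not stated in the text.

```python
def top_candidates(scores, known, k=10):
    """Rank drugs by predicted score (descending) and return the top k,
    skipping drugs whose association with the disease is already known."""
    ranked = sorted((drug for drug in scores if drug not in known),
                    key=lambda drug: (-scores[drug], drug))
    return ranked[:k]


# Hypothetical predicted probabilities for one disease.
scores = {"drug_a": 0.91, "drug_b": 0.15, "drug_c": 0.77, "drug_d": 0.64}
known = {"drug_a"}  # already an established treatment in this toy example
print(top_candidates(scores, known, k=2))
```

The returned list would then be checked against the literature, as was done for the top 10 Parkinson's disease candidates.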

Conclusions
In this study, we propose a heterogeneous information graph-based method for predicting drug-disease associations, named WMAGT. WMAGT integrates Graph Transformer Networks and Neural Collaborative Filtering, with a core improvement lying in the deep aggregation of local neighbors around nodes to enhance traditional graph convolution operations. Simultaneously, the model autonomously learns to select weights for different types of convolutional networks, resulting in a significant performance improvement compared with a single graph convolution network. Extensive experiments were conducted to thoroughly assess the performance and robustness of WMAGT. WMAGT exhibited superior performance on three benchmark datasets, outperforming the other state-of-the-art models compared. Ablation studies further verified the importance of the different modules introduced in the proposed framework. In addition, the case study shows that WMAGT has high practical predictive power: in mining potential drugs for Parkinson's disease, 7 of the top 10 predicted drugs are supported by the literature. This study not only introduces methodological refinements but also substantiates their feasibility and superiority through rigorous experimentation and empirical validation. It is anticipated that these results can serve as valuable references for further drug development and disease treatment.
Fig. 5 The performance of WMAGT and other variants under tenfold cross-validation on three benchmark datasets

Fig. 1 The overall architecture of the proposed WMAGT. WMAGT involves three main steps. First, drug and disease similarity networks are jointly encoded using GCN and graph transformer for representation projection. In the second step, matrix operations project drug and disease representations in the network, generating new information. Lastly, the domain information from the first step and interactive information from the second step are utilized in the NCF module, and multiple loss functions along with MLP are employed to comprehensively model the drug-disease relationship

Fig. 2 The performance of WMAGT and other compared methods under tenfold cross-validation on Cdataset

Fig. 3 The performance of WMAGT and other compared methods under tenfold cross-validation on Fdataset

Table 1 Details of the three benchmark datasets

Table 2 Performance of WMAGT and other compared methods on three benchmark datasets. Bold indicates the best performing method on each metric

Table 3 The top 10 WMAGT-predicted candidate drugs for Parkinson's disease